Self-Supervised Intrinsic Image Decomposition
Intrinsic decomposition from a single image is a highly challenging task, due to its inherent ambiguity and the scarcity of training data. In contrast to traditional fully supervised learning approaches, in this paper we propose learning intrinsic image decomposition by explaining the input image. Our model, the Rendered Intrinsics Network (RIN), joins together an image decomposition pipeline, which predicts reflectance, shape, and lighting conditions given a single image, with a recombination function, a learned shading model used to recompose the original input based on the intrinsic image predictions. Our network can then use unsupervised reconstruction error as an additional signal to improve its intermediate representations. This allows large-scale unlabeled data to be useful during training, and also enables transferring learned knowledge to images of unseen object categories, lighting conditions, and shapes. Extensive experiments demonstrate that our method performs well on both intrinsic image decomposition and knowledge transfer.
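The forward pass described above can be summarized as: decompose the image into reflectance, shape, and lighting, render a shading image from shape and lighting with a learned shader, and multiply reflectance by shading to reconstruct the input. Below is a minimal PyTorch sketch of that pipeline under simplifying assumptions; the module names, layer sizes, and the spatial lighting map are illustrative placeholders (the paper predicts a compact point-light parameterization), not the authors' implementation.

```python
import torch
import torch.nn as nn

class SimpleEncoderDecoder(nn.Module):
    """Toy convolutional encoder-decoder standing in for the paper's networks."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class RenderedIntrinsicsNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Decomposition: image -> reflectance (3), normals (3), lighting (3).
        # (Here lighting is kept as a spatial map for simplicity; the paper
        # uses a low-dimensional point-light representation.)
        self.decomposer = SimpleEncoderDecoder(3, 9)
        # Learned shading model: normals + lighting -> grayscale shading.
        self.shader = SimpleEncoderDecoder(6, 1)

    def forward(self, image):
        intrinsics = self.decomposer(image)
        reflectance, normals, lighting = intrinsics.split(3, dim=1)
        shading = self.shader(torch.cat([normals, lighting], dim=1))
        # Recombination: reflectance * shading should explain the input.
        reconstruction = reflectance * shading
        return reflectance, normals, shading, reconstruction

# Self-supervised signal: reconstruction error against the input image.
model = RenderedIntrinsicsNetwork()
image = torch.rand(1, 3, 64, 64)
_, _, _, reconstruction = model(image)
recon_loss = nn.functional.mse_loss(reconstruction, image)
```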
Reviews: Self-Supervised Intrinsic Image Decomposition
The paper presents an interesting approach to the intrinsic image decomposition problem: given an input RGB image, it first decomposes it into shape (normals), reflectance (albedo), and illumination (point light) using a deep encoder-decoder architecture with three outputs. Then another encoder-decoder takes the predicted normals and light and outputs the shading of the shape. Finally, the result is obtained by multiplying the estimated reflectance (from the first encoder-decoder) with the estimated shading. The idea of using a reconstruction loss to recover the input image is interesting, but I believe it is only partially exploited in the paper. The network architecture still needs labeled data for the initial training.
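The reviewer's point about labeled data concerns how the supervised and self-supervised signals interact. The following is an illustrative sketch, not the authors' code, of how the two terms could be combined, reusing the hypothetical RenderedIntrinsicsNetwork sketch above: labeled batches supervise the intrinsic predictions directly, while unlabeled batches contribute only the reconstruction term.

```python
import torch.nn.functional as F

def training_loss(model, image, labels=None, recon_weight=1.0):
    """Combine unsupervised reconstruction error with optional supervision."""
    reflectance, normals, shading, reconstruction = model(image)
    # Unsupervised signal: explain the input image.
    loss = recon_weight * F.mse_loss(reconstruction, image)
    if labels is not None:
        # Supervised signal, available only for the labeled subset of data.
        loss = loss + F.mse_loss(reflectance, labels["reflectance"])
        loss = loss + F.mse_loss(normals, labels["normals"])
        loss = loss + F.mse_loss(shading, labels["shading"])
    return loss
```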